What is Claimed:

1. A chip multiprocessor (CMP) comprising

a plurality of processors disposed on a peripheral region of a chip, each
processor including

(a) a dual datapath for executing instructions,

(b) a compiler controlled register file (RF), coupled to the dual datapath,
for holding operands of an instruction, and

(c) a compiler controlled local memory (LM), a portion of the LM disposed
to a left of the dual datapath and another portion of the LM disposed to a right of the dual
datapath, for holding operands of an instruction,

a shared main memory disposed at a central region of the chip,

a crossbar system for coupling the shared main memory to each of the
plurality of processors, and

a first-in-first-out (FIFO) system for transferring operands of an instruction
among multiple processors of the plurality of processors.

2. The CMP of claim 1 wherein

the shared main memory includes embedded DRAM disposed in the central
region of the chip, and

the crossbar system is disposed above the embedded DRAM.

3. The CMP of claim 1 wherein

each processor includes at least one data port for accessing the shared main
memory,

the shared main memory includes a plurality of pages of embedded DRAM,
and

the crossbar system includes a plurality of horizontal and vertical busses,
each horizontal bus coupled to at least one page of embedded DRAM and each vertical bus
coupled to a different port of the plurality of processors.
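For illustration only, and not as part of the claims, the bus arrangement recited in claim 3 can be sketched as a small routing model. The names `route` and `arbitrate`, the `pages_per_bus` parameter, and the fixed-priority conflict rule are all hypothetical; the claim itself only states that each horizontal bus couples to at least one DRAM page and each vertical bus couples to a different processor port.

```python
def route(port, page, pages_per_bus=1):
    """Map a (processor port, DRAM page) request onto crossbar busses.

    Assumption: vertical busses are indexed by port number, and consecutive
    pages share a horizontal bus in groups of `pages_per_bus`.
    """
    return {"vertical_bus": port, "horizontal_bus": page // pages_per_bus}


def arbitrate(requests, pages_per_bus=1):
    """Grant at most one request per bus in a cycle (hypothetical policy).

    `requests` is a list of (port, page) pairs; earlier entries win conflicts.
    Returns (granted, stalled) lists of requests.
    """
    busy_horizontal, busy_vertical = set(), set()
    granted, stalled = [], []
    for port, page in requests:
        r = route(port, page, pages_per_bus)
        if r["horizontal_bus"] in busy_horizontal or r["vertical_bus"] in busy_vertical:
            stalled.append((port, page))
        else:
            busy_horizontal.add(r["horizontal_bus"])
            busy_vertical.add(r["vertical_bus"])
            granted.append((port, page))
    return granted, stalled
```

In this sketch, two ports that address pages on the same horizontal bus in the same cycle conflict, while requests to pages on different horizontal busses proceed in parallel.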

4. The CMP of claim 3 wherein

each of the horizontal and vertical busses is configured as a split-transaction
bus, in which data on each bus flows in only one direction at a time.

5. The CMP of claim 1 wherein

the compiler controlled RF is coupled to a data port for accessing the shared
main memory,

an instruction cache is coupled to an instruction port for accessing the
shared main memory,

the portion of the LM disposed to the left of the dual datapath and the other
portion of the LM disposed to the right of the dual datapath are each coupled to a data port
for accessing the shared main memory, and

the data port of the RF, the instruction port of the instruction cache, and the
data ports of the portions of the LM are separate ports configured to separately access the
shared main memory.

6. The CMP of claim 1 wherein

the FIFO system includes registers mapped to registers located in the RF.

7. The CMP of claim 1 wherein

the FIFO system includes a plurality of registers, each register configured to 
store data by a respective processor for destination to another processor. 

8. The CMP of claim 1 wherein 

the portion of the LM disposed to the left of the datapath and the other 
portion of the LM disposed to the right of the datapath each includes a level-one memory 
of a predetermined size, and 

the predetermined size is a variable size predetermined by the compiler. 

9. The CMP of claim 1 wherein the LM and the RF are level-one 
memories and the shared main memory is a level-two memory, and 

the CMP is free of an automatic data cache.

10. The CMP of claim 1 wherein 

the shared main memory includes an embedded DRAM and 

a double buffered sense amplifier for overlapping a next fetch with a current 
read operation. 
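For illustration only, and not as part of the claims, the effect of overlapping a next fetch with a current read can be sketched with a simple timing model. The function names, the cycle counts, and the assumption of exactly two buffers with fixed fetch and read latencies are hypothetical and are not taken from the claims.

```python
def serial_time(num_pages, fetch, read):
    """Total cycles when each page is fetched and then read with no overlap."""
    return num_pages * (fetch + read)


def double_buffered_time(num_pages, fetch, read):
    """Total cycles when the fetch of the next page overlaps the current read.

    Assumption: two buffers. Fetch i may start once fetch i-1 has finished
    and the buffer last used by page i-2 has been fully read out.
    """
    fetch_done, read_done = [], []
    for i in range(num_pages):
        prev_fetch = fetch_done[i - 1] if i >= 1 else 0
        buffer_free = read_done[i - 2] if i >= 2 else 0
        fd = max(prev_fetch, buffer_free) + fetch
        prev_read = read_done[i - 1] if i >= 1 else 0
        rd = max(fd, prev_read) + read
        fetch_done.append(fd)
        read_done.append(rd)
    return read_done[-1] if read_done else 0
```

Under these assumptions, the overlapped schedule is bounded by the slower of the two stages rather than by their sum, which is the benefit the double buffering is meant to provide.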

11. A chip multiprocessor (CMP) comprising 

first, second, third and fourth clusters of processors disposed on a peripheral 
region of a chip, each of the clusters of processors disposed at a different quadrant of the 
peripheral region of the chip, and each including a plurality of processors for executing 
instructions,

first, second, third and fourth clusters of embedded DRAM disposed in a
central region of the chip, each of the clusters of embedded DRAM disposed at a different
quadrant of the central region of the chip, and

first, second, third and fourth crossbars, respectively, disposed above the
clusters of embedded DRAM for coupling a respective cluster of processors to a respective
cluster of embedded DRAM,

wherein a memory load/store instruction is executed by at least one
processor in the clusters of processors by accessing at least one of the first, second, third
and fourth clusters of embedded DRAM by way of at least one of the first, second, third
and fourth crossbars.

12. The CMP of claim 11 wherein

each of the plurality of processors of each of the clusters of processors
includes a plurality of data ports, each configured to access the at least one of the clusters
of embedded DRAM.

13. The CMP of claim 11 wherein

each of the crossbars includes horizontal and vertical busses,

the vertical busses of the first, second, third and fourth crossbars,
respectively, coupled to ports of the processors of the first, second, third and fourth
clusters of processors, and

the horizontal busses of the first, second, third and fourth crossbars,
respectively, coupled to pages of the first, second, third and fourth clusters of embedded
DRAM.

14. The CMP of claim 13 wherein

inter-cluster horizontal arbitrators are coupled between inter-cluster ports
attached to the horizontal busses of the first and second crossbars, and between
inter-cluster ports attached to the horizontal busses of the third and fourth crossbars, and

inter-cluster vertical arbitrators are coupled between inter-cluster ports
attached to the vertical busses of the first and third crossbars, and between inter-cluster
ports attached to the vertical busses of the second and fourth crossbars.

15. The CMP of claim 11 wherein

a FIFO system is configured to couple processors in a cluster of processors
for transferring operands of an instruction between the processors in the cluster.

16. The CMP of claim 15 wherein

the FIFO system includes a plurality of input FIFOs and a single output FIFO
assigned to a processor in the cluster,

the input FIFOs configured to store data transferred from other processors in
the cluster to the processor in the cluster, and

the output FIFO configured to store data transferred from the processor to
the other processors in the cluster.
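For illustration only, and not as part of the claims, the FIFO arrangement recited in claim 16 can be sketched as a toy software model. The class name `ClusterFifos`, the `depth` parameter, the destination tag stored with each operand, and the one-transfer-per-step delivery rule are hypothetical; the claim only recites a single output FIFO and a plurality of input FIFOs per processor.

```python
from collections import deque


class ClusterFifos:
    """Toy model: each processor has one output FIFO and one input FIFO per peer."""

    def __init__(self, num_processors, depth=4):
        self.n = num_processors
        self.depth = depth  # assumed capacity of every FIFO
        self.output = [deque() for _ in range(num_processors)]
        # inputs[dst][src] holds operands sent from src to dst
        self.inputs = [
            {src: deque() for src in range(num_processors) if src != dst}
            for dst in range(num_processors)
        ]

    def send(self, src, dst, operand):
        """Queue an operand in src's single output FIFO; False if full or invalid."""
        if src == dst or len(self.output[src]) >= self.depth:
            return False
        self.output[src].append((dst, operand))
        return True

    def deliver(self):
        """Move the head of each output FIFO into the destination's input FIFO.

        Returns the number of operands moved in this step.
        """
        moved = 0
        for src in range(self.n):
            if not self.output[src]:
                continue
            dst, operand = self.output[src][0]
            target = self.inputs[dst][src]
            if len(target) < self.depth:
                self.output[src].popleft()
                target.append(operand)
                moved += 1
        return moved

    def receive(self, dst, src):
        """Pop the oldest operand that src sent to dst, or None if none is waiting."""
        queue = self.inputs[dst][src]
        return queue.popleft() if queue else None
```

A sender in this sketch calls `send`, a transfer step calls `deliver`, and the receiving processor calls `receive` naming the peer it expects data from, so operands from different senders never interleave in one queue.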

17. The CMP of claim 11 wherein

each of the processors in each cluster includes a compiler controlled local
memory (LM), a portion of the LM disposed to a left of each processor and another portion
of the LM disposed to a right of each processor for holding operands of an instruction.

18. The CMP of claim 17 wherein

the LM includes a predetermined number of pages of memory, and

the number of pages of the LM assigned to neighboring CPUs is a
task-dependent variable determined by the compiler.

19. The CMP of claim 17 wherein 

the LM includes at least one port configured to access at least one of the 
clusters of embedded DRAM. 

20. The CMP of claim 19 wherein 

the LM is configured to receive streaming media data, stored in the at least 
one cluster of embedded DRAM, via at least one port. 



